Array-based enzymatic oligonucleotide synthesis

ABSTRACT

Array-based enzymatic oligonucleotide synthesis creates a large number of polynucleotides using an uncontrolled and template independent polymerase such as terminal deoxynucleotidyl transferase (TdT). Spatial control of reaction conditions on the surface of the array allows creation of polynucleotides with a variety of arbitrary sequences. Spatial control may be implemented by removing protecting groups attached to nucleotides only at a selected location on the array or by other techniques such as location-specific regulation of enzymatic activity. The ratio of polynucleotides with protecting groups to unprotected polynucleotides used during a cycle of synthesis is adjusted to control the length of homopolymers created by the polymerase. Digital information may be encoded in the enzymatically synthesized polynucleotides. An encoding scheme for representing digital information in a nucleotide sequence accounts for homopolymers in the polynucleotides by collapsing homopolymer strings in the sequence data to a single nucleotide or to a shorter homopolymer.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a division of U.S. patent application Ser. No. 16/563,797, filed Sep. 6, 2019, entitled “ARRAY-BASED ENZYMATIC OLIGONUCLEOTIDE SYNTHESIS,” the entire content of which is hereby expressly incorporated by reference.

SEQUENCE LISTING

The content of the Sequence Listing XML of the sequence listing named “MS1-9390USD1_SequenceListingXML.xml,” which is 4,864 bytes in size, was created on Sep. 13, 2023, and electronically submitted, is incorporated herein by reference in its entirety.

BACKGROUND

Synthetic oligonucleotides, also referred to as polynucleotides, such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) have uses in medicine, molecular biology, nanotechnology, data storage, and other applications. Enzymatic oligonucleotide synthesis has emerged as an alternative to the long-standing nucleoside phosphoramidite method for the synthesis of polynucleotides. Enzymatic synthesis is performed with a template independent polymerase such as terminal deoxynucleotidyl transferase (TdT) rather than a series of chemical reactions. Enzymatic polynucleotide synthesis has advantages over the nucleoside phosphoramidite method because it is performed in an aqueous environment and does not use toxic organic chemicals. Enzymatic synthesis also has the potential to create longer polynucleotides than the nucleoside phosphoramidite method.

However, enzymes such as TdT add nucleotides in an unregulated manner that creates variable length homopolymers as a result of adding the same nucleotide multiple times during a single synthesis cycle. With the established nucleoside phosphoramidite method, each synthesis cycle reliably adds only a single nucleotide. Thus, due to the potential for homopolymer addition, it is difficult to create polynucleotides with specific single-base sequences using enzymatic synthesis.

Some applications for synthetic polynucleotides such as data storage do not necessarily require single-base precision or accuracy. However, even though single-base precision is not required, implementing data storage with polynucleotides at scale will use a large number of polynucleotides with different sequences to encode vast amounts of data. Therefore, techniques and systems for efficient and high throughput enzymatic synthesis of polynucleotides are desirable. Techniques for encoding digital information in polynucleotides that include homopolymers of variable, and potentially unknown, length are also desirable. This disclosure is made with respect to these and other considerations.

SUMMARY

This disclosure provides methods and systems for array-based enzymatic synthesis of polynucleotides. Spatially addressable control of reaction conditions on the surface of an array allows for synthesis of multiple polynucleotides with different sequences. The spatial control may be provided by removing protecting groups from nucleotides only at specific spots on the array. The spatial control may alternatively be provided by controlling polymerization activity so that the template independent polymerase actively adds nucleotides only at specific spots on the array. Changing the locations where polymerization is able to occur during each cycle of synthesis allows the polynucleotides at different spots on the surface of the array to be synthesized with different nucleotide sequences.

Array-based synthesis of polynucleotides using template independent polymerases overcomes limitations of previous enzymatic synthesis techniques that use beads in a test tube as a solid substrate. All polynucleotides synthesized in the same test tube, or another undifferentiated reaction chamber, will have the same sequence of nucleotides. Systems that require a different, physically-isolated reaction environment such as different test tubes to create polynucleotides with different nucleotide sequences are difficult to scale and have limited throughput. Array-based synthesis provides addressability and site-specific adaptation of reaction environments by using a rigid or semi-rigid surface that is substantially flat as the solid substrate for polynucleotide synthesis. This design provides multiple separately adjustable reaction environments with a structure that is more compact and requires less physical manipulation than a comparable system using beads and test tubes.

This disclosure also provides methods and systems for encoding digital information in polynucleotides synthesized using a template independent polymerase. Template independent polymerases such as TdT do not create polynucleotides with specific base-by-base level of sequence control but rather add homopolymers of variable length. Although the order in which nucleotides are incorporated can be controlled, it is difficult to precisely control the number of nucleotides added during a single cycle of synthesis. Thus, many existing techniques for encoding digital information in a sequence of nucleotide bases may be unable to decode polynucleotides that include homopolymers of variable and unknown length.

The encoding techniques described herein collapse homopolymers in sequence strings down to a single nucleotide or to a shorter homopolymer. If the original encoding scheme converts digital information to a sequence of nucleotide bases without homopolymers, then the decoding will reduce any string of the same nucleotide to a single instance of that nucleotide. For example, the nucleotide string ATTTA would be converted to the nucleotide string ATA before further processing.
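The collapse to a single nucleotide may be sketched in a few lines of Python. This is a minimal illustration only; the function name `collapse_homopolymers` is hypothetical and is not taken from this disclosure, and the input is assumed to be a sequencer output string of base letters.

```python
from itertools import groupby


def collapse_homopolymers(read: str) -> str:
    """Reduce every run of identical bases to a single instance of that base.

    Hypothetical helper: assumes the encoding scheme never places the same
    base in two adjacent positions, so any run seen in a read is treated as
    an artifact of unregulated extension.
    """
    # groupby yields one (base, run) pair per maximal run of equal characters.
    return "".join(base for base, _run in groupby(read))


if __name__ == "__main__":
    print(collapse_homopolymers("ATTTA"))  # ATA
```

Because the collapse discards run lengths entirely, it is only suitable when the original nucleotide sequence string contains no homopolymers.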

If the length of the homopolymers can be controlled at least approximately, then the length of the homopolymer generated by the template independent polymerase may be converted to a shorter homopolymer with a length that depends on the length of the homopolymer in the original nucleotide string. For example, if the enzyme is regulated so that each cycle of synthesis adds approximately four nucleotides (i.e., an average extension length of four nucleotides), a homopolymer of length eight would be interpreted as two occurrences of the specified nucleotide. An example of this is collapsing the nucleotide string GGGGGGGG to the shorter homopolymer GG.
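The length-dependent collapse may be sketched as follows. The function name and the rounding rule (round to the nearest whole number of cycles, halves rounded up, never fewer than one base) are assumptions made for illustration; this disclosure does not specify how run lengths that are not exact multiples of the extension length are resolved.

```python
from itertools import groupby


def collapse_by_extension_length(read: str, extension_length: int) -> str:
    """Shorten each homopolymer run according to an average extension length.

    Hypothetical helper. A run of n identical bases is interpreted as
    round(n / extension_length) occurrences of the base, with a minimum of
    one occurrence (assumed rounding rule).
    """
    if extension_length < 1:
        raise ValueError("extension_length must be a positive integer")
    collapsed = []
    for base, run in groupby(read):
        n = sum(1 for _ in run)
        # Integer arithmetic for round-half-up, avoiding float rounding quirks.
        copies = max(1, (2 * n + extension_length) // (2 * extension_length))
        collapsed.append(base * copies)
    return "".join(collapsed)


if __name__ == "__main__":
    print(collapse_by_extension_length("GGGGGGGG", 4))  # GG
```

With an extension length of four, a run of eight G's maps to GG, matching the example above, while a run of four maps to a single base.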

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter nor is it intended to be used to limit the scope of the claimed subject matter. The term “techniques,” for instance, may refer to system(s) and/or method(s) as permitted by the context described above and throughout the document.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.

FIG. 1 shows an array during three different stages of enzymatic polynucleotide synthesis using a mixture of unprotected nucleotides and nucleotides attached to a protecting group.

FIG. 2 is an architecture of a system for encoding digital information in a polynucleotide synthesized on an array by a template independent polymerase and for sequencing the polynucleotide to generate an output string that can be decoded to recover the digital information.

FIG. 3 is a flow diagram showing an illustrative process for array-based enzymatic synthesis of polynucleotides.

FIG. 4 shows an example decoding technique that converts output strings with homopolymers to collapsed sequence strings that have the homopolymers replaced with a lesser number of nucleotides.

FIG. 5 is a flow diagram showing an illustrative process for using a template independent polymerase to synthesize a polynucleotide that encodes digital information and recovering the digital information by analyzing an output string generated from sequencing the polynucleotide.

FIG. 6 is an illustrative computer architecture for implementing techniques of this disclosure.

DETAILED DESCRIPTION

This disclosure provides a method and device to overcome scaling issues with current techniques for enzymatic polynucleotide synthesis by synthesizing polynucleotides on a spatially addressable array. This disclosure also provides a method and device for encoding digital information in polynucleotides that are synthesized using an unregulated enzymatic process which may add more than one nucleotide during a single cycle of synthesis.

Polynucleotides, also referred to as oligonucleotides, include DNA, RNA, and hybrids containing mixtures of DNA and RNA. DNA includes nucleotides with one of the four natural bases cytosine (C), guanine (G), adenine (A), or thymine (T) as well as unnatural bases, noncanonical bases, and/or modified bases. RNA includes nucleotides with one of the four natural bases cytosine, guanine, adenine, or uracil (U) as well as unnatural bases, noncanonical bases, and/or modified bases. Nucleotides include both deoxyribonucleotides and ribonucleotides covalently linked to one or more phosphate groups.

Template independent polymerases are DNA or RNA polymerases that perform de novo oligonucleotide synthesis without use of a template strand. Currently known template independent polymerases include TdT and tRNA nucleotidyltransferase. TdT includes both the full-length wild-type enzyme and modified enzymes that are truncated or internally modified. One example of modified TdT is provided in U.S. Pat. No. 10,059,929. An example of truncated TdT is provided in U.S. Pat. No. 7,494,797. Thus, template independent polymerase as used herein includes full-length wild-type, truncated, or otherwise modified TdT, tRNA nucleotidyltransferase, and any subsequently discovered or engineered polymerases that can perform template independent synthesis of polynucleotides. Template independent polymerase as used herein does not encompass modifications of TdT or tRNA nucleotidyltransferase that render those enzymes incapable of performing template independent nucleotide polymerization.

TdT is a protein that evolved to rapidly catalyze the linkage of naturally occurring deoxynucleotide triphosphates (dNTPs). TdT adds nucleotides indiscriminately to the 3′ hydroxyl group at the 3′ end of single-stranded DNA. TdT performs unregulated synthesis, adding any available dNTP. TdT uses an existing single-stranded polynucleotide referred to as an “initiator” as the starting point for synthesis. Initiators as short as three nucleotides have been successfully used with TdT for enzymatic synthesis of DNA. Suitable initiator length ranges from three nucleotides to about 30 nucleotides or longer. During polymerization, the template independent polymerase holds a single-stranded DNA strand (which initially is only the initiator but grows as synthesis proceeds) and adds dNTPs in a 5′-3′ direction. TdT activity is maximized at approximately 37° C., and the enzyme performs its reactions in an aqueous environment.

Because TdT performs unregulated synthesis, using this enzyme to create a polynucleotide with a pre-specified arbitrary sequence requires regulation and control of the TdT activity. One technique to regulate TdT activity is limiting the available nucleotides to only a single type of deoxynucleoside triphosphate (dNTP) or nucleoside triphosphate (NTP) (e.g., only deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), deoxythymidine triphosphate (dTTP), adenosine triphosphate (ATP), cytidine triphosphate (CTP), guanosine triphosphate (GTP), or uridine triphosphate (UTP)). Thus, providing only one choice forces the polymerase to add that type of nucleotide.

However, this does not prevent TdT from adding that nucleotide multiple times, thereby creating homopolymers. Techniques for limiting homopolymer creation by TdT include using nucleotides with removable protecting groups that prevent addition of more than one nucleotide at a time. See U.S. Pat. No. 10,059,929. Another technique to force single-nucleotide addition is covalently coupling a single nucleotide to each TdT enzyme so that the TdT acts as its own protecting group preventing further chain elongation. See Sebastian Palluck et al., De novo DNA synthesis using polymerase-nucleotide conjugates, 36(7) Nature Biotechnology 645 (2018) and WO 2017/223517 A1. A third technique restricts homopolymer formation by limiting the available quantity of nucleotides through competition for dNTPs between TdT and an enzyme that degrades dNTPs. See Henry H. Lee et al., Terminator free template-independent Enzymatic DNA Synthesis for Digital Information Storage, 10(2383) Nat. Comm. (2019) and WO 2017/176541 A1.

Although techniques exist for limiting the “extension length” or average number of nucleotides added during a cycle of synthesis, current methods for enzymatic nucleotide synthesis involve initiators attached to beads in a test tube or other discrete reaction chamber. The reaction chamber is flooded with an aqueous solution containing TdT and only one type of dNTP. Once coupling has taken place, the TdT and any free dNTPs are washed away. The beads are incubated in a second step with TdT and a different dNTP. The process continues, creating DNA molecules with a sequence specified by the order in which the different dNTPs are added. Depending on the control technique used, TdT may add a single nucleotide or an uncontrolled number of the same nucleotide during each cycle of synthesis. This process does not scale well for applications that require high throughput synthesis of multiple polynucleotides with different sequences.

One relatively recent application for synthetic polynucleotides that benefits from high throughput synthesis is data storage. Polynucleotides such as DNA may be used to store digital information by designing a sequence of nucleotide bases (adenine (A), cytosine (C), guanine (G), and thymine (T)) that encodes the zeros and ones of digital information. Advantages of using DNA rather than another storage medium for storing binary data include information density and longevity. The sequence of nucleotide bases is designed on a computer and then DNA molecules with that sequence are generated by an oligonucleotide synthesizer. The DNA may be stored and later read by a polynucleotide sequencer to retrieve the binary data.

There are various techniques and encoding schemes known to those of skill in the art for using nucleotide bases to represent binary data. See Lee Organick et al., Random Access in Large-Scale DNA Data Storage, 36:3 Nat. Biotech. 243 (2018), and Henry H. Lee et al., Terminator free template-independent Enzymatic DNA Synthesis for Digital Information Storage, 10(2383) Nat. Comm. (2019), WO 2018/148260 A1 and U.S. Pat. App. Pub. No. 2017/0141793. However, these encoding techniques either require enzymatic synthesis techniques that constrain the extension length to a single nucleotide or accommodate homopolymers in the encoding by sacrificing information density such as by encoding data in trits rather than using an alphabet with four (or more) letters.

FIG. 1 shows an illustrative representation of an array 100 for enzymatic polynucleotide synthesis at three different stages of the synthesis cycle. The array 100 provides a solid support for solid-phase synthesis of polynucleotides. Solid-phase synthesis is a method in which molecules are anchored to a solid support material and synthesized while attached to the solid support.

The array 100 may be formed from a silicon chip, glass (e.g., controlled porous glass (CPG)), an insoluble polymer, or other material. The array 100 being a generally flat two-dimensional surface provides for addressable, site-specific manipulations at specified locations (e.g., represented in terms of x- and y-coordinates) on the surface of the array 100. The array 100 may be an electrochemically inert surface or it may include an array of spatially addressable microelectrodes. One example of a suitable array with microelectrodes is provided in U.S. patent application Ser. No. 16/435,363 filed on Jun. 7, 2019, with the title “Reversing Bias in Polymer Synthesis Electrode Array.”

The array 100 may be covered with a plurality of spots 102(A), 102(B), . . . , 102(N) at which initiators 104 are attached. Although only three spots 102(A), 102(B), 102(N) are shown in this illustrative representation, many thousands or hundreds of thousands of spots may be present on a typical array 100. The size of a single spot 102 can be smaller than about 1 cm², smaller than 1 mm², smaller than 0.5 mm², and in some implementations about 0.125 to 0.5 mm².

The initiators 104 may be attached to the array 100 using any known technique for anchoring single-stranded DNA or RNA to a solid support such as techniques used in conventional solid-phase synthesis of oligonucleotides or used for creation of DNA microarrays. For example, the initiators 104 may be spotted onto the array 100 by use of a robot to “print” pre-designed nucleotide sequences using fine-pointed pins, needles, or ink-jet printing onto a chemical matrix surface using surface engineering. Other methods employ photo-activated chemistry and masking to synthesize the initiators 104 one nucleotide at a time on the solid surface of the array 100 with a series of repeated steps to build up the initiators 104 at designated locations.

The initiator 104 is a single-stranded polynucleotide chain. The length of the initiator 104 may be about 3-30 nucleotides, about 15-25 nucleotides, or about 20 nucleotides. The initiator 104 is not shown to scale. Enzymatic synthesis begins at a 3′ terminal nucleotide on the end of the initiator 104 and proceeds by adding one or more nucleotides to the initiator 104. The most recently added nucleotide becomes the 3′ terminal nucleotide and the synthesis cycle can proceed again.

All of the initiators 104 attached to the array 100 may have the same or approximately the same nucleotide sequence or one or more of the initiators 104 may have different sequences from the others. The sequence of any one or more of the initiators 104 may be a random sequence of nucleotides. The initiators 104 may also be constructed with non-random sequences such as, for example, sequences that are cleaved by a restriction endonuclease. Cleavage of the initiators 104 is one way to release completed polynucleotides 106 from the surface of the array 100. The sequences of the initiators 104 may also be designed or used as primer binding sites for subsequent amplification (e.g., polymerase-chain reaction (PCR) amplification) of fully synthesized polynucleotides.

Each spot 102 on the array 100 may contain many tens or hundreds of initiators 104, although for simplicity only three initiators 104 are shown on each spot 102 in this illustrative representation. Each initiator 104 attached to the same spot 102 is subject to the same spatially addressable control. Stated differently, any spatially addressable manipulations applied to the array 100 are performed at the resolution of individual spots 102. However, the polynucleotides 106 synthesized on the same spot 102 do not necessarily have the same single nucleotide sequence because of the formation of variable length homopolymers. In this illustrative representation, a first cycle of synthesis has added from one to three adenine nucleotides (A) to the 3′ end of the initiators 104.

At the start of synthesis, each of the 3′ nucleotides on the ends of the initiators 104 may be capped with a protecting group 108 represented in FIG. 1 as small stars. Each cycle of synthesis may end when all or substantially all of the polynucleotides 106 attached to the array 100 are capped with a protecting group 108. The protecting group 108 prevents further addition of nucleotides, and thus, is one way to regulate synthesis of the polynucleotides 106.

The protecting groups 108 may attach to the 3′ hydroxyl group or to another location on a dNTP, NTP, or a nucleotide attached to the initiator 104. The protecting groups 108 may be any kind of moiety or group that prevents a polymerase from adding additional nucleotides. As is known to those skilled in the art, there are various techniques for removing protecting groups 108 based on the specific composition of the protecting group 108 and the reaction environment. For example, a protecting group 108 may be removed by addition of chemicals (e.g., an acid or base solution), may be photolabile and cleaved by exposure to light, may be thermolabile and cleaved by exposure to heat, or may be cleaved by an enzyme.

Some examples of protecting groups 108 include esters, ethers, carbonitriles, phosphates, carbonates, carbamates, hydroxylamine, borates, nitrates, sugars, phosphoramide, phosphoramidates, phenylsulfenates, sulfates, sulfones, and amino acids. See Michael L. Metzker et al., Termination of DNA Synthesis by Novel 3′-modified-deoxyribonucleoside 5′-triphosphates, 22(20) Nucl. Acids Res., 4259 (1994) and U.S. Pat. Nos. 5,763,594, 6,232,465, 7,414,116, and 7,279,563. Other types of protecting groups 108 include 3′-O-amino, 3′-O-allyl, and 3′-O-azidomethyl groups. Examples of protecting groups 108 also include O-phenoxyacetyl; O-methoxyacetyl; O-acetyl; O-(p-toluene)-sulfonate; O-phosphate; O-nitrate; O-[4-methoxy]-tetrahydrothiopyranyl; O-tetrahydrothiopyranyl; O-[5-methyl]-tetrahydrofuranyl; O-[2-methyl,4-methoxy]-tetrahydropyranyl; O-[5-methyl]-tetrahydropyranyl; and O-tetrahydrothiofuranyl. See U.S. Pat. No. 8,133,669 for a discussion of these protecting groups. Additional examples of protecting groups are provided in U.S. patent application Ser. No. 16/230,787 filed on Dec. 21, 2018, with the title “Selectively Controllable Cleavable Linkers.”

Selectively addressable deblocking of the protecting groups 108 that are capping the polynucleotides 106 on one or more of the spots 102, in this example spot 102(A), regulates where on the surface of the array 100 nucleotides may be added during the next synthesis cycle. In a synthesis cycle, the surface of the array 100 may be flooded with a selected nucleotide 110 (i.e., a nucleotide having a specified base such as A, C, G, T, or U) and a polymerase 112. Controlling which spots 102 on the array 100 have the protecting groups 108 removed thereby defines a selected location 114 for addition of the next type of nucleotide.

The selected location 114 may be any one or more locations that are contiguous or separate on the surface of the array 100. The selected location 114 may be a single spot 102, a group of spots 102 located adjacent to each other, or multiple disparate spots 102 spread across the surface of the array 100. In some implementations, the selected location 114 has an area less than about 10,000 μm² or less than 100 μm². The resolution or minimum size of the selected location 114 may be a single spot 102.
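The way a selected location 114 changes from cycle to cycle can be illustrated with a small planning sketch. The scheduling policy shown (rotating through the bases in a fixed order and deblocking every spot whose next base matches) is an assumption chosen for simplicity and is not a method stated in this disclosure; the function name `plan_cycles` and the spot labels are hypothetical. Each target is written in collapsed form, so one letter corresponds to one cycle of synthesis regardless of how many nucleotides that cycle actually adds.

```python
def plan_cycles(targets, base_order="ACGT"):
    """Return a list of (nucleotide, spots_to_deblock) synthesis cycles.

    targets maps a spot label to the collapsed sequence wanted at that spot.
    Assumed policy: rotate through base_order; in each cycle the selected
    location is every unfinished spot whose next base equals the nucleotide
    being flooded over the array. Cycles with no matching spot are skipped.
    """
    for spot, seq in targets.items():
        unknown = set(seq) - set(base_order)
        if unknown:
            raise ValueError(f"spot {spot} uses bases not in base_order: {unknown}")

    progress = {spot: 0 for spot in targets}  # bases completed per spot
    cycles = []
    step = 0
    while any(progress[s] < len(targets[s]) for s in targets):
        base = base_order[step % len(base_order)]
        step += 1
        selected = [
            s for s in targets
            if progress[s] < len(targets[s]) and targets[s][progress[s]] == base
        ]
        if not selected:
            continue
        cycles.append((base, selected))
        for s in selected:
            progress[s] += 1
    return cycles


if __name__ == "__main__":
    plan = plan_cycles({"102(A)": "AG", "102(B)": "GA", "102(N)": "AT"})
    for base, spots in plan:
        print(base, spots)
```

In this example the first cycle floods A and deblocks spots 102(A) and 102(N), the second floods G and deblocks 102(A) and 102(B), and so on, so three spots receive three different sequences from a shared series of reagent additions.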

The polymerase 112 is a template independent polymerase such as TdT or tRNA nucleotidyltransferase. The template independent polymerase 112 may be obtained from a number of sources such as isolation from calf thymus or from a recombinant source (e.g., a genetically modified E. coli strain). The template independent polymerase 112, selected nucleotide 110, and other entities that are not attached to the array 100 are present in an aqueous solution (not shown) that covers the surface of the array 100. The aqueous solution may include buffers, salts, electrolytes, and the like. For example, the aqueous solution may include TdT buffer and a CoCl₂ solution.

The selected nucleotide 110 in this illustrative representation includes the base guanine (G) such as dGTP or GTP. The selected nucleotide 110 may be provided in a nucleotide mixture that includes both protected nucleotides 110(A) that are attached to a protecting group 108 and unprotected nucleotides 110(B) that are not attached to a protecting group 108. The selected nucleotide 110, including both the protected nucleotides 110(A) and the unprotected nucleotides 110(B), may be provided in excess so that the availability of the selected nucleotide 110 is not a limiting factor. So long as the selected nucleotide 110 is available and reaction conditions permit the polymerase 112 to function, the template independent polymerase 112 will continually incorporate unprotected nucleotides 110(B) until a protected nucleotide 110(A) is incorporated.

Thus, the nucleotide ratio 116 of protected nucleotides 110(A) to unprotected nucleotides 110(B) may be tuned to adjust the extension length. In this illustrative representation the extension length, the number of “G” nucleotides added, is one, two, or three. In actual synthesis, the extension length could be much longer and include a greater range. Thus, the extension length for the polynucleotides 106 at spot 102(A) is a variable number of nucleotides 118 with an average value of two.

This variation exists because the selection of a protected nucleotide 110(A) or an unprotected nucleotide 110(B) to incorporate at the end of a growing polynucleotide 106 is essentially random based on diffusion of the nucleotides throughout the aqueous solution covering the array 100. Thus, under a given set of reaction conditions the extension length for a population of polynucleotides 106 will be a variable number of nucleotides 118 with a distribution concentrated around a mean extension length. The reaction conditions include temperature, time, and the concentrations of the protected nucleotide 110(A), of the unprotected nucleotide 110(B), and of the template independent polymerase 112.

Thus, unless context indicates otherwise, “extension length” refers to the average extension length for a given set of reaction conditions. This variation in extension length for individual ones of the polynucleotides 106 is the reason why a population of polynucleotides 106 synthesized under the same reaction conditions includes homopolymers with a variable number of nucleotides 118. This is different from a homopolymer with a fixed number of nucleotides that is the same throughout a population of polynucleotides.

Adjusting the nucleotide ratio 116 of the protected nucleotides 110(A) to unprotected nucleotides 110(B) is a way to tune or adjust the extension length. As the relative concentration of protected nucleotides 110(A) increases, the extension length decreases. Conversely, as the relative concentration of unprotected nucleotides 110(B) increases, the extension length increases. One example nucleotide ratio 116 is one protected nucleotide 110(A) for every 200 or more unprotected nucleotides 110(B).
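The inverse relationship between the proportion of protected nucleotides and the extension length can be illustrated with an idealized model. The model below is an assumption made only for illustration: it supposes that each incorporation is an independent draw in which a protected nucleotide is chosen with probability equal to its fraction of the mixture, that the polymerase incorporates both kinds equally well, that reaction time is not limiting, and that the terminating protected nucleotide counts toward the extension length. Actual incorporation kinetics may differ substantially, so the numbers are not predictions for any real reaction. The function names are hypothetical.

```python
from fractions import Fraction


def protected_fraction(protected: int, unprotected: int) -> Fraction:
    """Fraction of the nucleotide mixture that carries a protecting group."""
    if protected <= 0 or unprotected < 0:
        raise ValueError("need protected > 0 and unprotected >= 0")
    return Fraction(protected, protected + unprotected)


def mean_extension_length(protected: int, unprotected: int) -> Fraction:
    """Mean of a geometric distribution: expected additions until capping."""
    return 1 / protected_fraction(protected, unprotected)


def extension_length_probability(k: int, protected: int, unprotected: int) -> Fraction:
    """Probability that exactly k nucleotides are added in one cycle.

    k - 1 unprotected incorporations followed by one protected incorporation.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    p = protected_fraction(protected, unprotected)
    return (1 - p) ** (k - 1) * p


if __name__ == "__main__":
    # One protected nucleotide for every three unprotected nucleotides.
    print(mean_extension_length(1, 3))            # 4
    print(extension_length_probability(2, 1, 3))  # 3/16
```

Under these assumptions, raising the share of protected nucleotides shortens the mean extension length and lowering it lengthens the mean, which is the qualitative behavior described above.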

Using a mixture that includes unprotected nucleotides 110(B) rather than only protected nucleotides 110(A) provides cost benefits because unprotected nucleotides 110(B) are less expensive. For example, dNTPs with protecting groups such as CleanAmp® dNTPs available from TriLink® Biotechnologies cost approximately 2.5 times more than equivalent unprotected dNTPs.

Spatially addressable deblocking of the protecting groups 108 is one of many possible techniques for controlling where on the array 100 polynucleotide synthesis is able to occur. In implementations where protecting groups 108 are not used, the selected location 114 may be defined by areas on the array 100 where conditions are changed such that the template independent polymerase 112 is active. Localized template independent polymerase 112 activation may be achieved by regulating the oxidation state of a metal cofactor that catalyzes activity of the polymerase 112. U.S. patent application Ser. No. 16/543,433 filed on Aug. 16, 2019, with the title “Regulation of Polymerase Using Cofactor Oxidation States” describes the use of cofactor oxidation states to control template independent polymerase activity.

Other techniques for controlling the activity of the template independent polymerase 112 include use of an inkjet printer or microelectrode array to selectively make available a reagent or induce a change that is necessary for template independent polymerase 112 activity. For example, spatially addressable addition of the selected nucleotide 110 or a metal cofactor can be used to define a selected location 114. Cleavage of a protecting group attached to the template independent polymerase 112 only at the selected location 114 may be used to regulate the locations of template independent polymerase 112 activity.

FIG. 2 shows an illustrative architecture of a system 200 for implementing aspects of this disclosure. In the system 200, digital information 202 representing, for example, the data from a computer file is provided to a computing device 204. The computing device 204 may be implemented as any type of conventional computing device such as a desktop computer, a laptop computer, a server, a hand-held device, or the like. The computing device 204 may be a standalone device or may be integrated with another device present in the system 200.

The computing device 204 includes an oligonucleotide synthesizer control module 206. The oligonucleotide synthesizer control module 206 provides instructions that can control operation of an oligonucleotide synthesizer 208. The instructions may communicate to the oligonucleotide synthesizer 208 a nucleotide sequence string 210 for synthesis. The nucleotide sequence string 210 is an electronic representation of specific nucleotide bases. In this illustrative system 200, the nucleotide sequence string 210 begins with the deoxyribonucleotides AGTAGGCGT.

The nucleotide sequence string 210 is generated by an encoding module 212 in the computing device 204. The encoding module 212 converts the digital information 202 into a nucleotide sequence string 210 according to an encoding scheme. The encoding scheme may include error correction, redundancy, and addition of metadata such as nucleotide sequences that function as tags for random access. Encoding schemes for representing digital information 202 as a nucleotide sequence string 210 are known to those of skill in the art. In an implementation, the encoding module 212 generates a nucleotide sequence string 210 that does not include homopolymers. Thus, every position in the string is followed by a position with a different value. However, in a different implementation, the encoding module 212 generates a nucleotide sequence string 210 that includes homopolymers.
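One simple way to obtain a nucleotide sequence string in which no position is followed by the same value is a rotating base-3 code, sketched below. This is offered only as an illustration of a homopolymer-free string; it is not presented as the encoding scheme of this disclosure, and it carries the information-density cost of working in trits rather than a four-letter alphabet. The function names, the fixed base ordering, and the default starting base (standing in for the last base of an initiator) are assumptions.

```python
BASES = "ACGT"


def trits_to_bases(trits, previous="A"):
    """Encode base-3 digits so that no base repeats its predecessor.

    For each trit (0, 1, or 2), pick among the three bases that differ from
    the previously emitted base. `previous` is an assumed starting context.
    """
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError(f"not a trit: {t!r}")
        choices = [b for b in BASES if b != previous]
        previous = choices[t]
        out.append(previous)
    return "".join(out)


def bases_to_trits(seq, previous="A"):
    """Invert trits_to_bases; raises ValueError if a base repeats."""
    trits = []
    for b in seq:
        choices = [x for x in BASES if x != previous]
        trits.append(choices.index(b))  # fails if b == previous
        previous = b
    return trits


if __name__ == "__main__":
    encoded = trits_to_bases([0, 0, 2, 1])
    print(encoded)                  # CATC
    print(bases_to_trits(encoded))  # [0, 0, 2, 1]
```

Because adjacent positions always differ, a sequencer read of a polynucleotide made from such a string can have every run of repeated bases reduced to one base before `bases_to_trits` is applied.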

The oligonucleotide synthesizer 208 is a device that performs automated solid-phase synthesis of polynucleotides on an array 100. The array 100 may be located within a reaction chamber 209 or container capable of maintaining an aqueous environment in contact with the surface of the array 100. The oligonucleotide synthesizer 208 may also include a heater to control the temperature of the aqueous solution in the reaction chamber 209.

During each synthesis cycle, the oligonucleotide synthesizer 208 may deliver a reaction reagent solution 214 followed by a wash solution 216 to the reaction chamber 209 containing the array 100. The reaction reagent solution 214 and the wash solution 216 may be delivered to the array 100 through fluid delivery pathways. The fluid delivery pathways may be implemented by tubes and pumps, microfluidics, laboratory robotics, or other equipment and techniques.

The reaction reagent solution 214 is an aqueous solution that contains the template independent polymerase 112, the selected nucleotide 110, and a buffer or salt. The selected nucleotide 110 may be added to the reaction reagent solution 214 from one of several nucleotide mixtures 218. Each of the nucleotide mixtures 218 includes only one type of nucleotide (e.g., dATP, dGTP, dCTP, or dTTP) as a mixture of protected nucleotides 110(A) and unprotected nucleotides 110(B). Each of the nucleotide mixtures 218 may include a specific nucleotide ratio 116. The specific nucleotide ratio 116 may be different for different types of nucleotides due to variations in polymerization kinetics depending on the base attached to the nucleotide sugar. The base-specific nucleotide ratios 116 may be selected so that the extension length is the same for each type of nucleotide. Protected nucleotides 110(A) that are not mixed with unprotected nucleotides 110(B) may also be available to the oligonucleotide synthesizer 208. The reaction reagent solution 214 may be created with only protected nucleotides 110(A) to synthesize sequences with precise base-by-base level control of the synthesized sequence.

The buffer may be any one of a number of aqueous buffers that are compatible with the template independent polymerase 112 such as, for example, phosphate-buffered saline (PBS). PBS is a water-based salt solution containing disodium hydrogen phosphate, sodium chloride and, in some formulations, may also include one or more of potassium chloride and potassium dihydrogen phosphate. The buffer may be an aqueous solution including 1 M potassium cacodylate, 125 mM Tris-HCl, 5 mM CoCl₂, and 1.25 mg/ml BSA, at pH 6.6. Other examples of aqueous buffers known to those of ordinary skill in the art include HEPES, MOPS, PBS, PBST, TAE, TBE, TBST, TE, and TEN. See Vincent S. Stoll & John S. Blanchard, Buffers: Principles and Practice, 182 Meth. Enzymol. 24 (1990).

The wash solution 216 may be added to the array 100 as a step of the polynucleotide synthesis process. The wash solution 216 is water (e.g., DI (deionized) water) or an aqueous solution that contains at least one of a salt or a buffer. The salt or the buffer may be the same as the salt or buffer used in the reaction reagent solution 214. The wash solution 216 removes the reaction reagent solution 214 from the surface of the array 100, which removes any remaining free nucleotides and prevents contamination between separate cycles of synthesis.

In implementations that do not use protecting groups 108, a technique other than capping polynucleotide strands with a protecting group 108 is used to stop the activity of the template independent polymerase 112. Other techniques for stopping the template independent polymerase 112 from continually adding nucleotides include removing a necessary component of the polymerization reaction from the reaction chamber 209 and inactivating a necessary component of the polymerization reaction.

Displacing the reaction reagent solution 214 with the wash solution 216 removes both the template independent polymerase 112 and the selected nucleotide 110. Thus, adding the wash solution 216 is one way to stop the activity of the template independent polymerase 112. Other ways to stop the activity of the template independent polymerase 112 include raising the temperature of the aqueous solution in the reaction chamber 209 to a temperature that inactivates the template independent polymerase 112. For example, the template independent polymerase 112 may be inactivated at temperatures above about 70° C., above about 65° C., or above about 60° C. Addition of a chelator such as EDTA that coordinates with metal cofactors necessary to catalyze enzymatic activity is another technique that may be used to stop the activity of the template independent polymerase 112. Denaturing the template independent polymerase 112 with a surfactant such as sodium dodecyl sulfate (SDS) is yet another way to inactivate the template independent polymerase 112.

Enzymatic polynucleotide synthesis without the use of protecting groups 108 achieves spatially addressable synthesis through use of a spatially addressable control 220. The spatially addressable control 220 is used to designate the selected location 114 during each cycle of synthesis. The spatially addressable control 220 may be implemented as an array of spatially addressable microelectrodes embedded in or beneath the array 100. The microelectrodes may be implemented with any known technology for creating microelectrodes such as complementary metal-oxide-semiconductor (CMOS) technology. CMOS may include metal-oxide-semiconductor field-effect transistors (MOSFETs) made through a triple-well process or by a silicon-on-insulator (SOI) process. A series of controllable gates/transistors implemented with CMOS circuits can be controlled to inject charge at any location on the surface of the array 100.

Each spatially addressable microelectrode in the array 100 may be independently addressed, allowing the creation of arbitrary and variable voltage microenvironments across the surface of the array 100. Changes in the voltage microenvironments can promote or inhibit polynucleotide synthesis depending on the identity of species in the reaction reagent solution 214 such as metal cofactors. Thus, a microelectrode array can control where on the surface of the array 100 polynucleotide synthesis occurs.

The spatially addressable control 220 may be an inkjet printer that is able to precisely apply small volumes of reagents to specific locations on the surface of the array 100. Techniques for using inkjet printing to precisely deliver chemical reagents to selected locations on a surface of an array are well-known to those of ordinary skill in the art. The chemical reagent delivered by the inkjet printer may be used to promote or inhibit polynucleotide synthesis. Thus, an inkjet printer can control where on the surface of the array 100 polynucleotide synthesis occurs.

The spatially addressable control 220 may be a light array that is capable of directing light to specific locations on the surface of the array 100. Light from the light array may excite a photocatalyst that performs a photoredox reaction or that cleaves a photolabile linker. The light array may include a photomask or digital micromirror device to direct the light. Thus, a light array can control where on the surface of the array 100 polynucleotide synthesis occurs.

The spatially addressable control 220 may use a technique to physically block the template independent polymerase 112 or the selected nucleotide 110 from accessing areas of the array that are not the selected location 114. Doing so limits polymerization to the selected location 114 on the array 100. In one implementation, the template independent polymerase 112 and/or the selected nucleotide 110 are blocked from some regions of the array 100 by targeted and precise addition of the wash solution 216 (or another solution that does not contain the template independent polymerase 112 and/or nucleotides). The wash solution 216 may displace or dilute the template independent polymerase 112 and the selected nucleotide 110 where added, thereby preventing polynucleotide synthesis. Microfluidics embedded in the array 100 may include pores or outlets collocated with the spots 102. Use of the microfluidics to deliver the wash solution 216 to one or more spots 102 during the incubation prevents polymerization on the initiators at those spots 102.

In one implementation, blocking access to regions of the array may be achieved by depositing or creating gas bubbles at the locations of one or more of the spots 102. The gas bubbles occupy the surface of the array 100 and remain in position due to surface tension of the reaction reagent solution 214. The array 100 may have a well or depression at the location of a spot 102 which can stabilize the position of the gas bubble. Presence of a gas bubble displaces the reaction reagent solution 214, thereby preventing the template independent polymerase 112 and the selected nucleotide 110 from accessing initiators 104 covered by or contained within a gas bubble.

A gas bubble may be deposited on the surface of the array 100 at a specified location, such as at a given spot 102, by directing air, nitrogen, oxygen, carbon dioxide, or another gas through a small diameter tube. The location of the tube relative to the surface of the array 100 may be controlled by laboratory robotics or a similar system capable of precision movements relative to two axes (e.g., x- and y-axis of the array 100). A hydrogen gas bubble may be created on the surface of a microelectrode array by activating one or more microelectrodes to electrolyze water in the reaction reagent solution 214.

The spatially addressable control 220, no matter how implemented, may be operated by control circuitry in the oligonucleotide synthesizer 208. The control circuitry may be implemented as any type of circuitry suitable for controlling hardware devices such as a printed circuit board, a microcontroller, a programmable logic controller (PLC), or the like. The control circuitry may receive instructions from the oligonucleotide synthesizer control module 206. The instructions may indicate the regions of the array 100 at which polynucleotide synthesis will occur. The control circuitry then causes the spatially addressable control 220 to remove protecting groups 108 or enable the activity of the template independent polymerase 112 at the selected location 114.

Ultimately the oligonucleotide synthesizer 208 creates a polynucleotide 106. A different polynucleotide sequence may be synthesized on each of the spots 102 on the array 100. The sequence of the polynucleotide 106 is specified by the nucleotide sequence string 210 received from the encoding module 212. However, due to the unregulated polymerization by the template independent polymerase 112, the polynucleotide 106 includes homopolymers that are not present in the nucleotide sequence string 210.

The polynucleotide 106 may be cleaved from the array 100 by severing a connection between the initiator 104 and the array 100 or by cleaving the initiator 104. The polynucleotide 106 free of the array 100 may then be stored in solution such as the wash solution 216, as a dried pellet in a tube, or by any other technique for storage of polynucleotides 106. During storage, the polynucleotides 106 may serve to provide long-term “cold” storage for the digital information 202. The polynucleotide 106 may also be processed using any existing technique for processing polynucleotides including, but not limited to, PCR amplification of the polynucleotide 106 to increase the copy number. As is known to those of ordinary skill in the art, PCR amplification uses a forward and reverse primer to initiate polymerization of a template strand. Primer binding specificity is one factor in controlling the efficacy of PCR amplification. Thus, the sequence of the polynucleotide 106 that serves as a primer binding site may have less tolerance for incorporation of homopolymers than other regions of the polynucleotide 106. Accordingly, the regions of the polynucleotide 106 that may potentially function as primer binding sites can be created using a technique other than enzymatic synthesis or created with a reaction solution that contains only protected nucleotides 110(A) in order to achieve base-by-base precision in the nucleotide sequence.

When it is time to recover the digital information 202, the polynucleotide 106 may be sequenced by a polynucleotide sequencer 222. Polynucleotide sequencers are well known to those of ordinary skill in the art. Many different techniques may be used to read the sequence of nucleotide bases in the polynucleotide 106. Common sequencing techniques include dideoxy sequencing reactions, NextGen sequencing, and nanopore sequencing. Classic dideoxy sequencing reactions (Sanger method) use labeled terminators or primers and gel separation in slab or capillary electrophoresis.

NextGen sequencing refers to any of a number of post-classic Sanger type sequencing methods which are capable of high throughput, multiplex sequencing of large numbers of samples simultaneously. Current NextGen sequencing platforms are capable of generating reads from multiple distinct nucleic acids in the same sequencing run.

Nanopore sequencing uses a small hole, a “nanopore,” on the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.

The polynucleotide sequencer 222 provides an output string 224 in an electronic format that can be manipulated by a computer such as, but not limited to, the computing device 204 to decode and recover the digital information 202. However, as described above, the output string 224 may include a homopolymer 226 (likely many homopolymers) that is not present in the nucleotide sequence string 210. The differences between the output string 224 and the nucleotide sequence string 210 make it difficult or impossible for many encoding schemes to recover the digital information 202 from polynucleotides 106 created by enzymatic synthesis.

FIG. 3 shows process 300 for synthesizing polynucleotides on an array using a template independent polymerase. This process 300 may be implemented, for example, using any of the reactions, structures, and devices shown in FIGS. 1 and 2.

At 302, an array is prepared by the addition of one or more initiators. This results in the creation of an array that is covered with a plurality of initiators. The initiators are single-stranded nucleotides with a length of between about 3 and 30 nucleotides. A template independent polymerase uses the initiators as a starting point for polynucleotide synthesis by adding additional nucleotides to the 3′ terminal nucleotide at the end of each initiator. The array is rigid or semi-rigid and may be made out of silicon dioxide, glass, an insoluble polymer, or other material. The array has at least one substantially flat surface. The initiators may be attached to the array using any known technique for anchoring single-stranded DNA or RNA to a solid support such as techniques used in conventional solid-phase synthesis of oligonucleotides or used for creation of DNA microarrays.

Each of the initiators may be identical, having the same length and nucleotide sequence. However, there may also be variation among the initiators in terms of length as well as sequence. In some implementations, the sequences of the initiators may include a cut site for restriction enzymes or other nucleases to cleave synthesized polynucleotides from the surface of the array. In some implementations, the initiators may serve as primer binding sites for subsequent amplification of the synthesized polynucleotides.

At 304, the array is incubated with a reaction reagent solution. The reaction reagent solution may be delivered to a reaction chamber that contains the array. The reaction reagent solution may be added to the reaction chamber by a manual technique such as pipetting. The reaction reagent solution may be added to the reaction chamber by an automated or mechanized system such as via a fluid delivery pathway. The reaction reagent solution includes a template independent polymerase such as TdT and a selected nucleotide.

In one implementation, the selected nucleotide is provided as a nucleotide mixture. The nucleotide mixture includes the selected nucleotide attached to a protecting group and unprotected forms of the selected nucleotide. The nucleotide mixture may be one of the nucleotide mixtures 218 shown in FIG. 2. In one implementation, only unprotected nucleotides are included in the reaction reagent solution. For example, the selected nucleotide may be one of deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), deoxythymidine triphosphate (dTTP), adenosine triphosphate (ATP), cytidine triphosphate (CTP), guanosine triphosphate (GTP), or uridine triphosphate (UTP).

Incubation continues for a length of time referred to as a reactiontime. The reaction time may be any length of time sufficient forpolymerization to occur. If activity of the template independentpolymerase is not stopped, such as by addition of nucleotides withprotecting groups, increased reaction time increases the extensionlength. For example, the reaction time may be 10, 20, 30, 40, 50seconds, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60minutes, or longer.

Incubation is performed at a reaction temperature. The reactiontemperature may be maintained by a heater or heating element in theoligonucleotide synthesizer. The reaction temperature may be atemperature at which the template independent polymerase is active. Thereaction temperature may be different for different polymerases.Further, changes in the reaction temperature may affect the extensionlength. Within the range of acceptable temperatures for a givenpolymerase, increases in reaction temperature generally increase theextension length. The reaction temperature may be, for example, between20 and 40° C., such as 25, 30, or 37° C.

During the incubation, the template independent polymerase adds avariable number of nucleotides to the 3′ end of the initiators. Thevariable number of nucleotides is the extension length obtained under agiven set of reaction conditions. A target value for the variable numberof nucleotides may be predetermined prior to the incubation at 304. Forexample, the target value for the variable number of nucleotides may bespecified as part of an encoding scheme implemented by the encodingmodule 212 and the reaction conditions may be selected to achieve anaverage extension length that is the target value.

Reaction conditions that affect the variable number of nucleotidesinclude the reaction time, the reaction temperature, the concentrationof the template independent polymerase, and the concentration of freenucleotides. The density or number of initiators, specifically a numberof available 3′ ends, also affects the variable number of nucleotidesbecause this controls the number of available reaction positions for agiven concentration of polymerase and nucleotides. The concentration ofthe template independent polymerase may be measured in terms of enzymeactivity which is represented in “units.” One unit is defined as theamount of enzyme catalyzing the incorporation of 1 nmol dTTP intoacid-insoluble material in a total reaction volume of 50 μl in 60minutes at 37° C. using d(A)₁₈ (i.e., a single-stranded 18-mer ofdeoxyadenosine nucleoside) as an initiator. Persons of ordinary skill inthe art will be able to tune the variable number of nucleotides based onadjusting these reaction conditions.

If the nucleotide mixture includes protected nucleotides, the ratio of unprotected nucleotides to protected nucleotides also affects the variable number of nucleotides. The variable number increases as the relative proportion of protected nucleotides decreases. For example, with a reaction time of one minute and a reaction temperature of 37° C. using 20 units of TdT for each 10 pmol of initiators, the variable number of nucleotides is about five with a 1:1 ratio of 0.25 mmol of dNTPs with protecting groups to 0.25 mmol of unprotected dNTPs.

If the selected nucleotide is provided only in unprotected form, then the ratio among the template independent polymerase (as measured in units), the number of available 3′ ends on the initiators, and the selected nucleotide affects the variable number of nucleotides. Holding other factors constant, the following examples show how adjusting the reaction time can affect the variable number of nucleotides.

At a reaction temperature of 37° C. using 20 units of TdT for each 10 pmol of initiators and 0.5 mmol of the selected nucleotide (without protecting groups), the variable number of nucleotides is about 10 for a reaction time of one minute. Under the same conditions, the variable number of nucleotides increases to about 40 for a reaction time of five minutes. As a further example, with a reaction time of 10 seconds and a reaction temperature of 25° C. using units of TdT for each 10 pmol of initiators and 0.5 mmol of the selected nucleotide (without protecting groups), the variable number of nucleotides is about five.

At 306, process 300 branches depending on whether protecting groups are present on at least some of the nucleotides incubated with the array. Process 300 proceeds along the “yes” path to 308 if protecting groups such as the protecting group 108 shown in FIG. 1 are present on the nucleotides. If not, process 300 proceeds along the “no” path to 316.

At 308, activity of the template independent polymerase is stopped atthe end of the reaction time. The length of time until activity of thetemplate independent polymerase is stopped may define the reaction time.Some ways to stop activity of the template independent polymeraseinclude changing the oxidation state of a metal cofactor to an oxidationstate other than +2 or removing access to the metal cofactor bychelation.

Other ways to stop activity of the template independent polymerase include inactivating the template independent polymerase through heating or denaturation with a surfactant. Cleaving the phosphate group from the nucleotides can also prevent polymerization by denying the template independent polymerase reactive monomers. Another way of stopping activity of the template independent polymerase is to remove the template independent polymerase or the free nucleotides from contact with the surface of the array.

At 310, a wash solution is delivered to the array to remove the reactionreagent solution. The wash solution may be flowed across the entirereaction site displacing the reaction reagent solution and therebystopping polymerization. Thus, in some implementations, operations 308and 310 are the same because delivering the wash solution stops theactivity of the template independent polymerase. The wash solution iswater without added salts or an aqueous solution that contains at leastone of a salt or a buffer. The buffer may be any one of a number ofaqueous buffers that are compatible with polymerases and single-strandednucleotides such as PBS or tris-buffered saline (TBS).

The wash solution also clears any remaining nucleotides from theprevious cycle of synthesis from the reaction chamber and the surface ofthe array. This prevents incorporation of an incorrect nucleotide duringa subsequent cycle of synthesis.

At 312, the protecting groups are removed from a selected location onthe array. The protecting groups may be any kind of moiety or chemicalgroup incorporated into the protected nucleotides. Some examples ofprotecting groups are esters, ethers, carbonitriles, phosphates,carbonates, carbamates, hydroxylamine, borates, nitrates, sugars,phosphoramides, phosphoramidates, phenylsulfenates, sulfates, sulfones,and amino acids.

The selected location on the array may be one or more spots that each contain multiple individual polynucleotides such as the spots 102 illustrated in FIG. 1. The selected location may be selected based on which polynucleotides are identified as receiving the next free nucleotide according to instructions such as instructions provided by the oligonucleotide synthesizer control module 206. The selected location may be changed one or more times during the synthesis of polynucleotides on the array.

The protecting groups are removed using a technique suitable for thetype of protecting group such as by light, heat, electrochemistry, orpH. The spatially addressable deblocking through removal of theprotecting groups may be implemented by use of a spatially addressablemicroelectrode array, a spatially addressable inkjet printing system, aspatially addressable light array, or another device.

At 314, it is determined if the polynucleotides are formed. If allnucleotides needed to create the specified sequence of thepolynucleotides have been added, then the polynucleotides are fullyformed. Process 300 may then proceed along the “yes” path and end orproceed to 324 and add a primer binding site to the 3′ end of thefully-formed polynucleotide.

If, however, the polynucleotides are not yet fully formed, process 300 proceeds along the “no” path and returns to 304 where the array is again incubated with a reaction reagent solution. The reaction reagent solution in this iteration of synthesis may include a different selected nucleotide. The subsequent iteration may also add the selected nucleotide to a different set of polynucleotides on the array by changing the selected location at which the protecting groups are removed.

At 316, if protecting groups are not used, polymerization is activatedat a selected location on the array. Activating polymerization may beachieved by any technique that has a localized effect on the presence ofspecific components or reaction conditions that promote the activity ofthe template independent polymerase. In one implementation, theoxidation state of a metal cofactor of the template independentpolymerase may be changed to a +2 oxidation state at the selectedlocation thereby activating the polymerase. In one implementation,protecting groups attached to the template independent polymerase may beremoved at the selected location thereby activating the polymerase. Inone implementation, the selected nucleotide may be added only at theselected location.

Activating polymerization at the selected location on the array may alsobe achieved by globally activating polymerization across the array whileinhibiting polymerization at locations on the array other than theselected location. One technique for inhibiting polymerization isphysically blocking the template independent polymerase and/or theselected nucleotide from accessing locations on the array other than theselected location.

Access to locations on the array may be blocked by precise applicationof a fluid that does not contain the template independent polymerase orthe selected nucleotide to the surface of the array. This will displaceor dilute the template independent polymerase and/or the selectednucleotide at those locations thereby preventing polymerization. Anothertechnique for physically blocking the template independent polymeraseand/or the selected nucleotide is to deposit or create gas bubbles onthe surface of the array. The gas bubbles prevent the reaction reagentsolution containing the template independent polymerase and the selectednucleotide from contacting the array.

The selected location may be selected based on which polynucleotides areidentified as receiving the next free nucleotide according toinstructions such as instructions provided by the oligonucleotidesynthesizer control module 206. The selected location may be changed oneor more times during the synthesis of polynucleotides on the array.

At 318, activity of the template independent polymerase is stopped atthe end of the reaction time. The length of time until the activity ofthe template independent polymerase is stopped may define the reactiontime. Some ways to stop activity of the template independent polymeraseinclude changing the oxidation state of a metal cofactor and removingaccess to the metal cofactor such as by chelation.

Additional ways include inactivating the template independent polymerasethrough heating or denaturation with a surfactant. Preventing thenucleotides from polymerizing by cleaving the phosphate group can alsostop the activity of the template independent polymerase by denying itreactive monomers. Another way of stopping the activity of the templateindependent polymerase is to remove the template independent polymeraseor the free nucleotides from contact with the surface of the array.

At 320, a wash solution is delivered to the array to remove the reactionreagent solution. The wash solution may be flowed across the entirereaction site displacing the reaction reagent solution and therebystopping polymerization. Thus, in some implementations, operations 318and 320 are the same because delivering the wash solution stops theactivity of the template independent polymerase. The wash solution iswater without added salts or an aqueous solution that contains at leastone of a salt or a buffer. The buffer may be any one of a number ofaqueous buffers that are compatible with polymerases and single-strandednucleotides such as PBS or TBS.

The wash solution also clears any remaining nucleotides from theprevious cycle of synthesis from the reaction chamber and the surface ofthe array. This prevents incorporation of an incorrect nucleotide duringa subsequent cycle of synthesis.

At 322, it is determined if the polynucleotides are formed. If all nucleotides needed to create the specified sequence of the polynucleotides have been added, then the polynucleotides are fully formed. Process 300 may then proceed along the “yes” path and end or proceed to 324 and add a primer binding site to the 3′ end of the fully-formed polynucleotide. If, however, the polynucleotides are not yet fully formed, process 300 proceeds along the “no” path and returns to 304 where a reaction reagent solution is delivered to the reaction site. The reaction reagent solution in this subsequent iteration of synthesis may include a different selected nucleotide. A subsequent iteration may add nucleotides to a different set of polynucleotides on the array by changing the selected location used at 316.
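The loop through operations 304 to 322 can be sketched as a small simulation. This is a minimal sketch under stated assumptions and not a model of any particular synthesizer: the spot identifiers, the fixed A, C, G, T rotation of the selected nucleotide, and the uniformly random extension length between 1 and max_extension are all hypothetical choices made for illustration.

```python
import random

def synthesize(targets: dict[str, str], max_extension: int = 5, seed: int = 0) -> dict[str, str]:
    """Simulate synthesis cycles for target strings made only of A, C, G, and T.

    targets maps a hypothetical spot identifier to the nucleotide sequence
    string intended for that spot. Returns the simulated strand at each spot.
    """
    rng = random.Random(seed)
    progress = {spot: 0 for spot in targets}   # next position to write at each spot
    strands = {spot: "" for spot in targets}
    bases = "ACGT"
    cycle = 0
    # Repeat until every spot is fully formed (operations 314 / 322).
    while any(progress[s] < len(targets[s]) for s in targets):
        base = bases[cycle % 4]                # selected nucleotide for this cycle (assumed rotation)
        cycle += 1
        # Selected location: spots whose next required nucleotide is this base.
        selected = [s for s in targets
                    if progress[s] < len(targets[s]) and targets[s][progress[s]] == base]
        for spot in selected:
            # The polymerase adds a variable number of the selected nucleotide.
            strands[spot] += base * rng.randint(1, max_extension)
            progress[spot] += 1
    return strands
```

For targets that contain no homopolymers, collapsing each run of repeated nucleotides in a simulated strand to a single nucleotide gives back the target string for that spot. A practical controller would likely skip cycles in which no spot is selected; the sketch keeps them for simplicity.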

At 324, a primer sequence may be added to the 3′ ends of thefully-formed polynucleotides. The primer sequence may be added to the 3′end of a polynucleotide attached to an array by continuing enzymaticsynthesis using a nucleotide mixture that contains only protectednucleotides. This restricts addition per round of synthesis to only onenucleotide for all or most of the polynucleotides. This preciselycontrolled addition allows for base-by-base control in creation of aspecific sequence for the primer binding site.

Alternatively, the single-stranded nucleotide that will function as aprimer binding site may be synthesized elsewhere (e.g., using thephosphoramidite method) and ligated onto the 3′ end of thepolynucleotide while still attached to the array. Techniques forligation such as use of T4 DNA ligase are well-known to those ofordinary skill in the art.

Data Encoding with Enzymatically Synthesized Oligonucleotides

FIG. 4 illustrates encoding schemes for representing digital informationas a sequence of nucleotides created by a template independentpolymerase. The encoding scheme is tied to the behavior of templateindependent polymerases such as TdT because these polymerases add freenucleotides in an uncontrolled manner that creates homopolymers ofvariable length in the final polynucleotide. This variable length, alsoreferred to as extension length, is not known in advance and is notuniform. The average extension length can be controlled to some extentwith the techniques described above. However, even in a population ofpolynucleotides that are intended to have the same sequence (i.e.,polynucleotides attached to a same spot on the surface of an array)there are variations in the extension length.

Encoding schemes for representing digital information in a sequence ofnucleotides and for decoding and recovering digital information from apolynucleotide created with enzymatic synthesis do not depend on thetechnique used to synthesize the polynucleotide. That is to say,encoding schemes disclosed herein are equally applicable toenzymatically synthesized polynucleotides created on an addressablearray or on a different type of solid support such as beads in a testtube.

The encoding of digital information in a sequence of nucleotides may be done according to an encoding scheme that excludes or that permits homopolymers in a nucleotide sequence string. As used herein, the term “string” indicates a representation of the order of nucleotides in a polynucleotide and not an actual DNA or RNA molecule. A nucleotide sequence string, for example, may be represented as a series of letters (e.g., A, G, C, T) in an electronic file such as a FASTA file. Examples of encoding schemes for recording digital information in nucleotides that can be modified according to the teachings of this disclosure are known to those of ordinary skill in the art.

The encoding scheme may be implemented by an encoding module in a computing device such as encoding module 212 introduced in FIG. 2. If the encoding scheme excludes homopolymers, then the nucleotide sequence string that is generated will not have any consecutive repeats of the same nucleotide. If homopolymers are permitted, the length of homopolymers may be limited or bounded. For example, the encoding scheme may permit bounded homopolymers up to three repeats of the same nucleotide. Inclusion of homopolymers and increasing length of permissible bounded homopolymers both increase information density. Thus, a larger number of bits of digital information may be represented in the same length of polynucleotide if the encoding scheme accommodates homopolymers.
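The effect of the homopolymer bound on information density can be illustrated with a simple counting argument. The following Python sketch is not part of any encoding scheme disclosed herein; it only counts how many distinct nucleotide sequence strings of a given length exist when runs of the same nucleotide are limited to a maximum length, which gives an upper bound on the bits that any such scheme could represent. The function names `count_strings` and `bits_per_nucleotide` are hypothetical.

```python
import math


def count_strings(n: int, max_run: int, alphabet: int = 4) -> int:
    """Count length-n strings whose longest run of one symbol is <= max_run."""
    if n < 0 or max_run < 1:
        raise ValueError("n must be >= 0 and max_run must be >= 1")
    if n == 0:
        return 1
    # ends[r] = number of strings of the current length ending in a run of r + 1
    ends = [0] * max_run
    ends[0] = alphabet
    for _ in range(n - 1):
        total = sum(ends)
        new = [0] * max_run
        # Appending a different symbol always starts a new run of length 1.
        new[0] = total * (alphabet - 1)
        # Appending the same symbol lengthens the current run by one.
        for r in range(1, max_run):
            new[r] = ends[r - 1]
        ends = new
    return sum(ends)


def bits_per_nucleotide(n: int, max_run: int) -> float:
    """Upper bound on bits per nucleotide for strings of length n."""
    return math.log2(count_strings(n, max_run)) / n


if __name__ == "__main__":
    for max_run in (1, 2, 3):
        print(max_run, round(bits_per_nucleotide(100, max_run), 3))
```

With `max_run` set to 1 the count corresponds to strings that exclude homopolymers, and larger values of `max_run` admit more strings of the same length, consistent with the statement that permitting bounded homopolymers increases information density.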

Regardless of whether the encoding scheme permits homopolymers, polynucleotides synthesized by a template independent polymerase include homopolymers. Thus, an output string created by sequencing these polynucleotides with a polynucleotide sequencer will include homopolymers. The output string, like the nucleotide sequence string, is a representation of nucleotides in a polynucleotide. The output string reflects the order of individual nucleotide bases detected by a polynucleotide sequencer. The output string may be provided in an electronic file such as a FASTQ file.

With an encoding scheme that does not use homopolymers, an output string 400 (SEQ ID NO:1) is converted to a collapsed output string 402 (SEQ ID NO:2) by “collapsing” homopolymers in the output string 400 to single nucleotides. Thus, the start of the output string 400 AAAA is collapsed to a single A in the collapsed output string 402. The length of a homopolymer in the output string 400 does not affect the result. Every homopolymer regardless of length is collapsed to a single nucleotide. Portions of the output string 400 that do not include homopolymers are not changed by this operation.
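The collapsing operation for an encoding scheme that excludes homopolymers can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the encoding module 212; the function name `collapse_to_single` and the example read are hypothetical and are not the sequences of SEQ ID NO:1 or SEQ ID NO:2.

```python
from itertools import groupby


def collapse_to_single(output_string: str) -> str:
    """Replace every run of a repeated nucleotide with a single nucleotide."""
    # groupby yields one (base, run) pair per maximal run of identical bases,
    # so keeping only the key discards the run length entirely.
    return "".join(base for base, _run in groupby(output_string))


if __name__ == "__main__":
    read = "AAAACGGGTT"  # hypothetical output string from a sequencer
    print(collapse_to_single(read))  # ACGT
```

Because only the identity of each run is kept, the result is the same whatever the extension length was during synthesis.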

With an encoding scheme that permits homopolymers, the length of the homopolymer is used to determine how to collapse an output string 404 (SEQ ID NO:3). As described above, the extension length may be controlled at least approximately by controlling and tuning reaction conditions. If, for example, the extension length is five nucleotides, then each single nucleotide in the nucleotide sequence string is represented by, approximately, five nucleotides in the output string 404. Thus, a 2-nucleotide homopolymer in the nucleotide sequence string will result in a homopolymer of approximate length 10 in the output string 404.

The output string 404 in this example includes a 10-nucleotide long homopolymer of Gs and a 15-nucleotide long homopolymer of Cs. These homopolymers are collapsed, respectively, to a 2-nucleotide homopolymer of G and a 3-nucleotide homopolymer of C. Thus, the collapsed output string 406 includes shorter homopolymers than the output string 404. The lengths of the shorter homopolymers are based on the length of the homopolymers in the output string 404.

The lengths of the homopolymers in the output string 404 are not always precise multiples of the extension length. Thus, the length of homopolymers in the output string 404 may be rounded down or up to determine the length of homopolymer to include in the collapsed output string 406. The cutoff may be the midpoint between one extension length and the next. For example, if the extension length is five nucleotides, then any homopolymer of length 5-7 will be collapsed to a single nucleotide and any homopolymer of length 8-12 will be collapsed to a two-nucleotide homopolymer.
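The midpoint cutoff can be expressed as rounding the observed homopolymer length to the nearest integer multiple of the extension length. The Python sketch below is one possible way to do this and rests on two assumptions that are not stated in the text above: a run shorter than one extension length is treated as a single nucleotide, and a run that falls exactly on a midpoint (possible only for an even extension length) is rounded up. The function names are hypothetical.

```python
from itertools import groupby


def collapsed_run_length(run_length: int, extension_length: int) -> int:
    """Round run_length / extension_length to the nearest integer, minimum 1."""
    # Integer form of round-half-up: floor(run / ext + 1/2).
    nearest = (2 * run_length + extension_length) // (2 * extension_length)
    # Assumption: a run shorter than one extension still encodes one nucleotide.
    return max(1, nearest)


def collapse_scaled(output_string: str, extension_length: int) -> str:
    """Collapse each run to a shorter homopolymer based on the extension length."""
    if extension_length < 1:
        raise ValueError("extension_length must be a positive integer")
    parts = []
    for base, run in groupby(output_string):
        observed = sum(1 for _ in run)
        parts.append(base * collapsed_run_length(observed, extension_length))
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical read: runs of 5 A, 10 G, 6 T and 15 C, extension length 5.
    read = "A" * 5 + "G" * 10 + "T" * 6 + "C" * 15
    print(collapse_scaled(read, 5))  # AGGTCCC
```

With an extension length of five, runs of length 5-7 map to one nucleotide and runs of length 8-12 map to two, matching the example above.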

If the extension length is an even number, there may be homopolymers in the output string 404 with lengths that are exactly at the midpoint between one extension length and the next. For example, if the extension length is four, a homopolymer in the output string 404 of length six is equally likely to represent a single nucleotide or a two-nucleotide homopolymer. In this case, the number of nucleotides to include in the collapsed output string 406 may be decided by random selection. Alternatively, output strings 404 that include homopolymers of a length that is the midpoint between two integer multiples of the extension length may be discarded and not used for decoding the digital information.
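Both ways of handling a run that falls exactly on a midpoint can be sketched as follows. This Python example is illustrative only; the function names, the `policy` parameter, and the choice to return `None` for a discarded output string are assumptions made for the sketch. Runs shorter than one extension length are treated as a single nucleotide, which is also an assumption.

```python
import random
from itertools import groupby
from typing import Optional


def is_midpoint(run_length: int, extension_length: int) -> bool:
    """True if run_length lies exactly halfway between two multiples of the extension."""
    # Halfway between k*ext and (k+1)*ext means 2*run == (2k + 1)*ext with k >= 1.
    return (run_length > extension_length
            and (2 * run_length) % (2 * extension_length) == extension_length)


def collapse_with_ties(output_string: str, extension_length: int,
                       policy: str = "discard",
                       rng: Optional[random.Random] = None) -> Optional[str]:
    """Collapse runs; on a midpoint tie either discard the read or pick at random."""
    if policy not in ("discard", "random"):
        raise ValueError("policy must be 'discard' or 'random'")
    if rng is None:
        rng = random.Random()
    parts = []
    for base, run in groupby(output_string):
        observed = sum(1 for _ in run)
        if is_midpoint(observed, extension_length):
            if policy == "discard":
                return None  # the whole output string is not used for decoding
            lower = observed // extension_length
            count = rng.choice((lower, lower + 1))
        else:
            count = max(1, (2 * observed + extension_length) // (2 * extension_length))
        parts.append(base * count)
    return "".join(parts)


if __name__ == "__main__":
    read = "AAAA" + "GGGGGG"  # hypothetical read; the run of six G is a tie for extension 4
    print(collapse_with_ties(read, 4))                   # None (discarded)
    print(collapse_with_ties(read, 4, policy="random"))  # AG or AGG
```

Discarding is viable when many copies of the same polynucleotide are sequenced, because other output strings for the same sequence may not contain a tie.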

FIG. 5 shows process 500 for storing digital information on polynucleotides synthesized with a template independent polymerase. This process 500 may be implemented, for example, using any of the reactions, structures, and devices shown in FIGS. 1 and 2 as well as the decoding scheme shown in FIG. 4. Process 500 includes some operations that are implemented on a computer through electronic processing and some operations that are implemented through physical processing of nucleotides and polynucleotides.

At 502, digital information is received. The digital information may be a string of zeros and ones representing binary code for a computer file. For example, the digital information may be the same as the digital information 202 shown in FIG. 2.

At 504, the nucleotide sequence string is generated based on the digital information using an encoding scheme. The encoding scheme represents the digital information as a sequence of nucleotides. The encoding scheme may exclude or permit homopolymers in the nucleotide sequence string.

At 506, a polynucleotide including the digital information is synthesized according to the nucleotide sequence string with a template independent polymerase. The polynucleotide may be synthesized by an oligonucleotide synthesizer such as the oligonucleotide synthesizer 208 shown in FIG. 2. However, other devices and techniques for synthesizing the polynucleotide using a template independent polymerase are also suitable. Because the polynucleotide is synthesized with a template independent polymerase that adds multiple instances of the same nucleotide during a single synthesis cycle, the polynucleotide includes a homopolymer that is not present in the nucleotide sequence string.

At 508, the polynucleotide may be stored. The polynucleotide may be stored for a relatively short time in an aqueous solution such as the solution in which it was synthesized. The polynucleotide may be stored for a relatively longer period of time as a lyophilized pellet, encased in a protective coating, dried onto filter paper, or by another technique that preserves the structure of the polynucleotide. The polynucleotide, because it encodes digital information, represents a form of data storage.

At 510, the polynucleotide including the digital information is sequenced. The polynucleotide may be sequenced with any type of sequencing technology such as the polynucleotide sequencer 222 shown in FIG. 2. Sequencing the polynucleotide generates an output string which represents the order of nucleotide bases as detected by the polynucleotide sequencer.

At 512, the output string is received. The output string may be received, for example, by the encoding module in a computing device such as encoding module 212 shown in FIG. 2. The output string may be received as an electronic file containing a sequence of letters or other symbols representing nucleotide bases. The output string, because it is a representation of the polynucleotide, also includes the homopolymer that is not present in the nucleotide sequence string. Thus, the output string from the polynucleotide sequencer does not match the input string provided to the oligonucleotide synthesizer.

At 514, the output string is converted to a collapsed output string. The output string is converted to the collapsed output string by replacing the homopolymer in the output string with fewer nucleotides. If the encoding scheme used to generate the nucleotide sequence string excludes homopolymers, then the fewer nucleotides is a single nucleotide. Thus, any homopolymer sequence in the output string is collapsed to a single nucleotide. An example of converting an output string that is encoded without homopolymers is shown by the output string 400 and the collapsed output string 402 of FIG. 4.

If the encoding scheme used to generate the nucleotide sequence string allows homopolymers, then the fewer number of nucleotides is based on a length of the homopolymer in the output string. Thus, the fewer number of nucleotides may be a single nucleotide or it may be a homopolymer such as a string of the same nucleotide repeated two or three times. The fewer number of nucleotides may be further based on a variable number of nucleotides added by the template independent polymerase during a single synthesis cycle. As described above, this variable number of nucleotides may be adjusted at least approximately by controlling the reaction conditions of the synthesis. An example of converting an output string that is encoded with homopolymers is shown by the output string 404 and the collapsed output string 406 of FIG. 4.

At 516, the length of the homopolymer in the output string is identified. The number of nucleotides in the output string that form any given homopolymer can be counted and recorded. The resulting number of nucleotides in the converted output string may also be identified, and the change in the number of nucleotides for the homopolymer can thus be identified and recorded. For example, collapsing a string of TTTT to T may be identified as a 4:1 change in the number of nucleotides. As an additional example, collapsing a string of GGGGGG to GG may be identified as a 6:2 change.
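The bookkeeping described at 516 can be sketched as a pass over the output string that records, for each homopolymer, its observed length and its length after collapsing. The Python below is a minimal sketch; the `RunChange` record, the function name `record_changes`, and the convention that `extension_length=None` means the encoding scheme excludes homopolymers are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, NamedTuple, Optional


class RunChange(NamedTuple):
    base: str       # nucleotide that forms the homopolymer
    position: int   # 0-based index of the homopolymer start in the output string
    observed: int   # nucleotides in the output string
    collapsed: int  # nucleotides in the collapsed output string


def record_changes(output_string: str,
                   extension_length: Optional[int] = None) -> List[RunChange]:
    """List the observed:collapsed change for every homopolymer in a read."""
    changes = []
    position = 0
    for base, run in groupby(output_string):
        observed = sum(1 for _ in run)
        if extension_length is None:
            collapsed = 1  # scheme without homopolymers: always one nucleotide
        else:
            # Nearest multiple of the extension length, at least one nucleotide.
            collapsed = max(1, (2 * observed + extension_length)
                            // (2 * extension_length))
        if observed > 1:  # single nucleotides are not homopolymers
            changes.append(RunChange(base, position, observed, collapsed))
        position += observed
    return changes


if __name__ == "__main__":
    for change in record_changes("TTTTA"):
        print(f"{change.base}: {change.observed}:{change.collapsed}")  # T: 4:1
    for change in record_changes("GGGGGGA", extension_length=3):
        print(f"{change.base}: {change.observed}:{change.collapsed}")  # G: 6:2
```

Records of this kind could be passed to a decoding pipeline as side information about each output string.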

At 518, the collapsed output string is decoded using the encoding scheme to recover the digital information. The technique for decoding the collapsed output string may use the length of the homopolymer identified at 516 as one piece of information about the output string that is processed by a decoding pipeline. The decoding may be performed by the encoding module 212 or by an encoding and/or decoding module on another computing device besides the computing device 204 that provided the nucleotide sequence string.
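The computational portions of process 500 can be tied together in a small round-trip sketch. Everything in the following Python example is hypothetical: the disclosure does not specify a particular encoding scheme, so a toy homopolymer-free code is assumed in which each base-3 digit selects one of the three nucleotides that differ from the previous nucleotide. Synthesis and sequencing are replaced by an error-free simulation that merely repeats each nucleotide a random number of times, which is a crude stand-in for the variable extension length and not a model of TdT behavior.

```python
import random
from itertools import groupby
from typing import Optional

BASES = "ACGT"
TRITS_PER_BYTE = 6   # 3**6 = 729 >= 256, so six base-3 digits hold one byte
START_CONTEXT = "A"  # assumed base preceding the first encoded nucleotide


def encode(data: bytes) -> str:
    """Toy scheme (504): map bytes to a nucleotide string with no repeats."""
    prev, out = START_CONTEXT, []
    for byte in data:
        value, trits = byte, []
        for _ in range(TRITS_PER_BYTE):
            value, trit = divmod(value, 3)
            trits.append(trit)
        for trit in reversed(trits):  # most significant digit first
            prev = [b for b in BASES if b != prev][trit]
            out.append(prev)
    return "".join(out)


def simulate_read(sequence: str, mean_extension: int = 5,
                  rng: Optional[random.Random] = None) -> str:
    """Stand-in for 506-510: repeat each nucleotide a random number of times."""
    if rng is None:
        rng = random.Random()
    return "".join(b * rng.randint(1, 2 * mean_extension - 1) for b in sequence)


def collapse_to_single(output_string: str) -> str:
    """Step 514 for a scheme that excludes homopolymers."""
    return "".join(base for base, _run in groupby(output_string))


def decode(collapsed: str) -> bytes:
    """Step 518: invert the toy scheme."""
    if len(collapsed) % TRITS_PER_BYTE:
        raise ValueError("collapsed string length is not a whole number of bytes")
    prev, data = START_CONTEXT, bytearray()
    for i in range(0, len(collapsed), TRITS_PER_BYTE):
        value = 0
        for base in collapsed[i:i + TRITS_PER_BYTE]:
            value = value * 3 + [b for b in BASES if b != prev].index(base)
            prev = base
        data.append(value)
    return bytes(data)


if __name__ == "__main__":
    sequence = encode(b"DNA")
    read = simulate_read(sequence, rng=random.Random(1))
    print(len(sequence), len(read))
    print(decode(collapse_to_single(read)))  # b'DNA'
```

Because the toy scheme never places the same nucleotide twice in a row, collapsing every homopolymer to a single nucleotide recovers the nucleotide sequence string exactly, however many nucleotides were added in each simulated cycle. Real output strings also contain insertion, deletion, and substitution errors, which this sketch does not address.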

Illustrative Computer Architecture

FIG. 6 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a computing device such as the computing device 204 introduced in FIG. 2. In particular, the computer 600 illustrated in FIG. 6 can be utilized to implement the oligonucleotide synthesizer control module 206 or the encoding module 212 introduced in FIG. 2.

The computer 600 includes one or more processing units 602, a memory 604 that may include a random-access memory 606 (“RAM”) and a read-only memory (“ROM”) 608, and a system bus 610 that couples the memory 604 to the processing unit(s) 602. A basic input/output system (“BIOS” or “firmware”) containing the basic routines that help to transfer information between elements within the computer 600, such as during startup, can be stored in the ROM 608. The computer 600 further includes a mass storage device 612 for storing an operating system 614 and other instructions 616 that represent application programs and/or other types of programs such as, for example, instructions to implement the oligonucleotide synthesizer control module 206. The mass storage device 612 can also be configured to store files, documents, and data such as, for example, sequence data that is provided to an oligonucleotide synthesizer 208 in the form of instructions.

The mass storage device 612 is connected to the processing unit(s) 602 through a mass storage controller (not shown) connected to the bus 610. The mass storage device 612 and its associated computer-readable media provide non-volatile storage for the computer 600. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk, solid-state drive, CD-ROM drive, DVD-ROM drive, or USB storage key, it should be appreciated by those skilled in the art that computer-readable media can be any available computer-readable storage media or communication media that can be accessed by the computer 600.

Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

By way of example, and not limitation, computer-readable storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media includes, but is not limited to, RAM 606, ROM 608, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, 4K Ultra BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and which can be accessed by the computer 600. For purposes of the claims, the phrase “computer-readable storage medium,” and variations thereof, does not include waves or signals per se or communication media.

According to various configurations, the computer 600 can operate in a networked environment using logical connections to a remote computer(s) 618 through a network 620. The computer 600 can connect to the network 620 through a network interface unit 622 connected to the bus 610. It should be appreciated that the network interface unit 622 can also be utilized to connect to other types of networks and remote computer systems. The computer 600 can also include an input/output (I/O) controller 624 for receiving and processing input from a number of other devices, including a keyboard, mouse, touch input, an electronic stylus (not shown), or equipment such as an oligonucleotide synthesizer 208 and/or a polynucleotide sequencer 222. Similarly, the input/output controller 624 can provide output to a display screen or other type of output device (not shown).

It should be appreciated that the software components described herein, when loaded into the processing unit(s) 602 and executed, can transform the processing unit(s) 602 and the overall computer 600 from a general-purpose computing device into a special-purpose computing device customized to facilitate the functionality presented herein. The processing unit(s) 602 can be constructed from any number of transistors or other discrete circuit elements, which can individually or collectively assume any number of states. More specifically, the processing unit(s) 602 can operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions can transform the processing unit(s) 602 by specifying how the processing unit(s) 602 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting the processing unit(s) 602.

Encoding the software modules presented herein can also transform the physical structure of the computer-readable media presented herein. The specific transformation of physical structure depends on various factors, in different implementations of this description. Examples of such factors include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like. For example, if the computer-readable media is implemented as semiconductor-based memory, the software disclosed herein can be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For instance, the software can transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software can also transform the physical state of such components to store data thereupon.

As another example, the computer-readable media disclosed herein can be implemented using magnetic or optical technology. In such implementations, the software presented herein can transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations can include altering the magnetic characteristics of particular locations within given magnetic media. These transformations can also include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.

In light of the above, it should be appreciated that many types of physical transformations take place in the computer 600 to store and execute the software components presented herein. It also should be appreciated that the architecture shown in FIG. 6 for the computer 600, or a similar architecture, can be utilized to implement many types of computing devices such as desktop computers, notebook computers, servers, supercomputers, gaming devices, tablet computers, and other types of computing devices known to those skilled in the art. For example, the computer 600 may be wholly or partially integrated into the oligonucleotide synthesizer 208. It is also contemplated that the computer 600 might not include all of the components shown in FIG. 6, can include other components that are not explicitly shown in FIG. 6, or can utilize an architecture different than that shown in FIG. 6.

Illustrative Embodiments

The following clauses describe multiple possible embodiments for implementing the features described in this disclosure. The various embodiments described herein are not limiting nor is every feature from any given embodiment required to be present in another embodiment. Any two or more of the embodiments may be combined together unless context clearly indicates otherwise. As used in this document “or” means and/or. For example, “A or B” means A without B, B without A, or A and B. As used herein, “comprising” means including all listed features and potentially including addition of other features that are not listed. “Consisting essentially of” means including the listed features and those additional features that do not materially affect the basic and novel characteristics of the listed features. “Consisting of” means only the listed features to the exclusion of any feature not listed.

Clause 1: A method for synthesizing multiple polynucleotides having different sequences using a template independent polymerase, the method comprising: (a) incubating an array covered with a plurality of initiators for a reaction time at a reaction temperature with a reaction reagent solution comprising the template independent polymerase and a nucleotide mixture of a selected nucleotide comprising unprotected nucleotides such that the template independent polymerase adds a variable number of nucleotides to 3′ ends of the initiators, the variable number of nucleotides based on a ratio of the unprotected nucleotides to the protected nucleotides; (b) after the reaction time, stopping activity of the template independent polymerase; (c) delivering a wash solution to the array to remove the reaction reagent solution; (d) removing the protecting group from a selected location on the array; and (e) iteratively repeating steps (a), (b), (c), and (d) until the multiple polynucleotides are formed.

Clause 2: The method of clause 1, wherein during iterations of repeating steps (a), (b), (c), and (d) the selected nucleotide and the selected location both change at least once.

Clause 3: The method of any of clauses 1-2, wherein a target value for the variable number of nucleotides is predetermined prior to the incubating.

Clause 4: The method of any of clauses 1-3, wherein the template independent polymerase comprises TdT.

Clause 5: The method of any of clauses 1-4, wherein the selected nucleotide is one of deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), deoxythymidine triphosphate (dTTP), adenosine triphosphate (ATP), cytidine triphosphate (CTP), guanosine triphosphate (GTP), or uridine triphosphate (UTP).

Clause 6: The method of any of clauses 1-5, wherein the protecting group is a chemical group incorporated into the protected nucleotides and is removable by light, heat, electrochemistry, or pH.

Clause 7: The method of any of clauses 1-6, wherein stopping the activity of the template independent polymerase comprises changing an oxidation state of a metal cofactor for the template independent polymerase to an oxidation state other than +2.

Clause 8: The method of any of clauses 1-7, further comprising adding a primer sequence with a specific base-by-base sequence on to 3′ ends of the multiple polynucleotides.

Clause 9: A method of synthesis of multiple polynucleotides having different sequences using a template independent polymerase, the method comprising: (a) incubating an array covered with a plurality of initiators for a reaction time at a reaction temperature with a reaction reagent solution comprising the template independent polymerase and a selected nucleotide such that the template independent polymerase adds a variable number of nucleotides to 3′ ends of the initiators, the variable number of nucleotides based on a ratio of the template independent polymerase to the initiators to the selected nucleotide; (b) activating polymerization at a selected location on the array; (c) after the reaction time, stopping activity of the template independent polymerase; (d) delivering a wash solution to the array to remove the reaction reagent solution; and (e) iteratively repeating steps (a), (b), (c), and (d) until the multiple polynucleotides are formed, wherein during iterations of repeating steps (a), (b), (c), and (d) the selected nucleotide and the selected location both change at least once.

Clause 10: The method of clause 9, wherein a target value for the variable number of nucleotides is predetermined prior to the incubating.

Clause 11: The method of any of clauses 9-10, wherein the activating the polymerization at the selected location comprises changing an oxidation state of a metal cofactor of the template independent polymerase at the selected location, removing a protecting group from the template independent polymerase at the selected location, or adding the selected nucleotide at the selected location.

Clause 12: The method of any of clauses 9-11, wherein the activating the polymerization at the selected location comprises inhibiting the polymerization at locations on the array other than the selected location by physically blocking the template independent polymerase, the selected nucleotide, or both from accessing the locations on the array other than the selected location during the incubating.

Clause 13: The method of any of clauses 9-12, wherein stopping activity of the template independent polymerase comprises heating the reaction reagent solution to at least 60° C., denaturing the template independent polymerase with a surfactant, adding a chelator to the reaction reagent solution, or changing an oxidation state of a metal cofactor for the template independent polymerase to an oxidation state other than +2.

Clause 14: A method of decoding digital information in a polynucleotide synthesized by a template independent polymerase comprising: receiving an output string from sequencing of the polynucleotide that encodes the digital information according to a nucleotide sequence string generated by an encoding scheme, wherein synthesis with the template independent polymerase causes the polynucleotide to include a homopolymer that is not present in the nucleotide sequence string; converting the output string to a collapsed output string by replacing the homopolymer in the output string with fewer nucleotides; and decoding the collapsed output string using the encoding scheme to recover the digital information.

Clause 15: The method of clause 14, wherein the nucleotide sequence string generated by the encoding scheme excludes homopolymers and the fewer nucleotides is a single nucleotide.

Clause 16: The method of clause 14, wherein the nucleotide sequence string generated by the encoding scheme includes homopolymers and a number of the fewer nucleotides is based on a length of the homopolymer.

Clause 17: The method of clause 16, wherein the number of the fewer nucleotides is further based on a variable number of nucleotides added by the template independent polymerase during a single synthesis cycle.

Clause 18: The method of any of clauses 14-17, further comprising identifying a length of the homopolymer in the output string, wherein decoding the collapsed output string is based on a number of nucleotides in the homopolymer.

Clause 19: The method of any of clauses 14-18, further comprising sequencing the polynucleotide encoding the digital information thereby generating the output string.

Clause 20: The method of any of clauses 14-19, further comprising: generating the nucleotide sequence string from the digital information using the encoding scheme; synthesizing the polynucleotide encoding the digital information according to the nucleotide sequence string with the template independent polymerase; and sequencing the polynucleotide encoding the digital information thereby generating the output string.

CONCLUSION

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts are disclosed as example forms of implementing the claims.

The terms “a,” “an,” “the” and similar referents used in the context of describing the invention are to be construed to cover both the singular and the plural unless otherwise indicated herein or clearly contradicted by context. The terms “based on,” “based upon,” and similar referents are to be construed as meaning “based at least in part” which includes being “based in part” and “based in whole,” unless otherwise indicated or clearly contradicted by context. The terms “portion,” “part,” or similar referents are to be construed as meaning at least a portion or part of the whole including up to the entire noun referenced. As used herein, “approximately” or “about” or similar referents denote a range of ±10% of the stated value.

For ease of understanding, the processes discussed in this disclosure are delineated as separate operations represented as independent blocks. However, these separately delineated operations should not be construed as necessarily order dependent in their performance. The order in which the processes are described is not intended to be construed as a limitation, and unless otherwise contradicted by context any number of the described process blocks may be combined in any order to implement the process or an alternate process. Moreover, it is also possible that one or more of the provided operations is modified or omitted.

Certain embodiments are described herein, including the best mode known to the inventors for carrying out the invention. Of course, variations on these described embodiments will become apparent to those of ordinary skill in the art upon reading the foregoing description. Skilled artisans will know how to employ such variations as appropriate, and the embodiments disclosed herein may be practiced otherwise than specifically described. Accordingly, all modifications and equivalents of the subject matter recited in the claims appended hereto are included within the scope of this disclosure. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

Furthermore, references have been made to publications, patents, and/or patent applications throughout this specification. Each of the cited references is individually incorporated herein by reference for its particular cited teachings as well as for all that it discloses.

1. A method of synthesis of multiple polynucleotides having different sequences using a template independent polymerase, the method comprising: (a) incubating an array covered with a plurality of initiators for a reaction time at a reaction temperature with a reaction reagent solution comprising the template independent polymerase and a selected nucleotide such that the template independent polymerase adds a variable number of nucleotides to 3′ ends of the initiators, the variable number of nucleotides based on a ratio of the template independent polymerase to the initiators to the selected nucleotide; (b) activating polymerization at a selected location on the array; (c) after the reaction time, stopping activity of the template independent polymerase; (d) delivering a wash solution to the array to remove the reaction reagent solution; and (e) iteratively repeating steps (a), (b), (c), and (d) until the multiple polynucleotides are formed, wherein during iterations of repeating steps (a), (b), (c), and (d) the selected nucleotide and the selected location both change at least once.
2. The method of claim 1, wherein a target value for the variable number of nucleotides is predetermined prior to the incubating.
3. The method of claim 1, wherein the activating the polymerization at the selected location comprises changing an oxidation state of a metal cofactor of the template independent polymerase at the selected location, removing a protecting group from the template independent polymerase at the selected location, or adding the selected nucleotide at the selected location.
4. The method of claim 1, wherein the activating the polymerization at the selected location comprises inhibiting the polymerization at locations on the array other than the selected location by physically blocking the template independent polymerase, the selected nucleotide, or both from accessing the locations on the array other than the selected location during the incubating.
5. The method of claim 1, wherein stopping activity of the template independent polymerase comprises heating the reaction reagent solution to at least 60° C., denaturing the template independent polymerase with a surfactant, adding a chelator to the reaction reagent solution, or changing an oxidation state of a metal cofactor for the template independent polymerase to an oxidation state other than +2.
6. The method of claim 1, wherein the activating the polymerization at the selected location comprises changing an oxidation state of a metal cofactor of the template independent polymerase at the selected location to +2 and stopping activity of the template independent polymerase comprises changing the oxidation state of the metal cofactor to an oxidation state other than +2.
7. The method of claim 1, wherein one or more of the initiators has a random sequence of nucleotides, has a sequence of nucleotides that is cleaved by a restriction endonuclease, or has a sequence of nucleotides that includes a primer binding site.
8. The method of claim 1, wherein the ratio of the template independent polymerase to the initiators to the selected nucleotide is 10 units of template independent polymerase for each 10 pmol of initiators and 0.5 mmol of the selected nucleotide.
9. The method of claim 7, wherein the variable number of nucleotides is about 5.
10. The method of claim 1, wherein the reaction reagent solution comprises the selected nucleotide only in unprotected form.
 11. A method of decoding digital information in a polynucleotide synthesized by a template independent polymerase comprising: receiving an output string from sequencing of the polynucleotide that encodes the digital information according to a nucleotide sequence string generated by an encoding scheme, wherein synthesis with the template independent polymerase causes the polynucleotide to include a homopolymer that is not present in the nucleotide sequence string; converting the output string to a collapsed output string by replacing the homopolymer in the output string with fewer nucleotides; and decoding the collapsed output string using the encoding scheme to recover the digital information.
 12. The method of claim 11, wherein the nucleotide sequence string generated by the encoding scheme excludes homopolymers and the fewer nucleotides is a single nucleotide.
 13. The method of claim 11, wherein the nucleotide sequence string generated by the encoding scheme includes homopolymers and a number of the fewer nucleotides is based on a length of the homopolymer.
 14. The method of claim 13, wherein the number of the fewer nucleotides is further based on a variable number of nucleotides added by the template independent polymerase during a single synthesis cycle.
 15. The method of claim 13, wherein a cutoff used for determining the number of the fewer nucleotides is a midpoint between one extension length and a next extension length.
 16. The method of claim 11, further comprising identifying a length of the homopolymer in the output string, wherein decoding the collapsed output string is based on a number of nucleotides in the homopolymer.
 17. The method of claim 16, further comprising recording a ratio of the number of nucleotides in the output string that comprise the homopolymer and a resulting number of nucleotides in the collapsed output string.
 18. The method of claim 11, further comprising sequencing the polynucleotide encoding the digital information thereby generating the output string.
 19. The method of claim 11, further comprising: generating the nucleotide sequence string from the digital information using the encoding scheme; synthesizing the polynucleotide encoding the digital information according to the nucleotide sequence string with the template independent polymerase; and sequencing the polynucleotide encoding the digital information thereby generating the output string.
 20. The method of claim 19, wherein the encoding scheme includes homopolymers and synthesizing the polynucleotide comprises controlling extension length of a selected nucleotide to create a nucleotide sequence that encodes a homopolymer.
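
By way of non-limiting illustration only, the homopolymer-collapsing step recited in claims 11 through 15 may be sketched in Python as follows. The sketch is not part of the claims and is not asserted to be the implementation used by the inventors. The function names `collapse_to_single` and `collapse_by_extension_length`, the parameter `extension_length` (an assumed integer standing for the typical number of nucleotides added per synthesis cycle), and the choice to round a run that falls exactly on a midpoint cutoff upward are all assumptions made for the example.

```python
from itertools import groupby


def collapse_to_single(output_string: str) -> str:
    """Replace every run of identical nucleotides with one nucleotide.

    Corresponds to the case where the encoding scheme excludes homopolymers,
    so any repeated nucleotide in the sequencing output is treated as an
    artifact of synthesis.
    """
    return "".join(base for base, _run in groupby(output_string))


def collapse_by_extension_length(output_string: str, extension_length: int) -> str:
    """Shorten each run to the nearest whole multiple of extension_length.

    Corresponds to the case where the encoding scheme includes homopolymers.
    The cutoffs sit at the midpoints between consecutive multiples of
    extension_length (1.5x, 2.5x, ...). Assumption: a run exactly on a
    cutoff is rounded up, and every run yields at least one nucleotide.
    """
    if extension_length < 1:
        raise ValueError("extension_length must be a positive integer")
    collapsed = []
    for base, run in groupby(output_string):
        run_length = sum(1 for _ in run)
        # Integer form of rounding run_length / extension_length half-up.
        count = (2 * run_length + extension_length) // (2 * extension_length)
        collapsed.append(base * max(1, count))
    return "".join(collapsed)


if __name__ == "__main__":
    read = "A" * 5 + "C" * 10 + "G" * 4
    print(collapse_to_single(read))               # ACG
    print(collapse_by_extension_length(read, 5))  # ACCG
```

With an assumed extension length of 5, the cutoff between one and two intended nucleotides falls at 7.5, so a run of 7 collapses to one nucleotide and a run of 8 collapses to two. The collapsed string would then be passed to whatever decoder corresponds to the encoding scheme.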